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Abstract 



On the basis of order statistics, iBakerl (120081 ) proposed a method for con 



structing multivariate distributions with fixed marginals. This is another 
representation of the Bernstein copula. According to the construction of 
Baker's distribution, the Bernstein copula can be regarded as a finite mix- 
ture distribution. In this paper, we propose expectation-maximization (EM) 
algorithms to estimate the Bernstein copula function, and prove the local 
convergence property. Moreover, asymptotic properties of the proposed semi- 
parametric estimators are provided. Illustrative examples are presented using 
real datasets. 

Keywords: Baker's distribution, Bernstein polynomial, Density estimation, 
Linear convergence, Order statistic, Ordered categorical data 

1. Introduction 



On the basis of order statistics, IBakerl (120081 ) proposed a simple and intu- 



itive method for constructing multivariate distributions with fixed marginals. 
Her idea in the case of bivariate distributions is stated as follows: Let 
Xi, . . . , X m and Y\, . . . , Y n be two independent i.i.d. samples from distri- 
butions F and G, respectively. The marginal (cumulative) distribution func- 
tions F(x) and G(y) can be both continuous and/or discrete. By sorting 
the two samples, we obtain the order statistics Xm < ■ ■ • < A( m ) and 
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Y{i) < • • • < ^"(n)- Furthermore, independently of X k s and Y^'s, we choose 
uniform random numbers K and L from {1, . . . , m} and {1, . . . , n}, respec- 
tively. Obviously, the marginal distributions of X^k) and Yjx) are F and G, 
the same as those of Xk and Y\. The joint distribution of (X^, Y( L \), the 
pair of the Kt\i and Lth smallest order statistics, is called Baker's bivariate 
distribution. Multivariate Baker's distribution can be defined in the similar 
way. 

According to the above constructive definition, we see that Baker's bivari- 
ate distribution is parameterized by an m x n matrix parameter R = (r k i), 
where 

r kl = Pi(K = k,L = l), 1 < k < m, 1 < / < n. 
Since the marginal distributions of K and L are uniform, R must satisfy 

n ^ m 

y^ r ki=—, y^ r ki = -, r kl >o. (i) 

1=1 k=l 

If rfc/ = l/(mn), that is, if .fT and L are independent, then Xik) and Y(n are 
also independent. Otherwise, K and L are not independent and (X^,Y^) 
becomes a correlated bivariate random variable. 

Let F k:m (x) and Gi :n (y) be the distribution functions of the order statistics 
X(j~) and Ym, respectively. Baker's distribution is a finite mixture distribution 
of mn-components with distribution function 

m n 

H(x,y; R) = Pr(X {K) < x, Y {L) < y) = ^^r w F fc:m (z)G, :n (y). (2) 

k=i i=i 

It is known that distribution functions of order statistics can be described 
in terms of the Bernstein polynomials. Let the Bernstein polynomial and its 
integral be 



uV(l - u) n ~\ B k>n (u) = £b k>n {t)dt, u G [0, 1], 



respectively. Then, the distribution functions F k:m (x) and Gi :n (y) are ex- 
pressed as F k . m (x) = mB h ] m _A F(x)) and G hn {y) = nB l _ ln ^ 1 {G{y)) (see, 



e.g., (1) in iHwang and Linl ( 1l984l )). Substituting these into (E]), we have 



H(x,y;R) = C(F(x),G(y);R) 1 (3) 
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where 



C(u, v; R) = ran 



m n 

EE 

k=l 1=1 



TklBk- 



l,m-l 



'u)B. 



l-l,n-l\ 



G[0,1] 2 . (4) 



By Sklar's theorem, any bivari ate d i stribu t ion can be rep resented by a 
copula and its marginals (see, e.g., IJod (120011 ). iNelsonl (120061 ) ). Recall that 



the copula can be an arbitrary bivariate distribution function on [0, l] 2 whose 
marginals are the uniform distribution on [0, 1]. In fact, C(u,v; R) in (J4J) is 
a copula, and so Baker's distribution can also be expressed by the copula 
function with arguments F and G and parame ter R, as shown in ([3D. In thi s 
paper, we call C(u, v\ R) the Bernstein copula (ISancetta and Satchelll . I2004J ). 

Because of the flexibility of copula functions, they are useful to describe 
multivariate distributions, and many copula functions have been proposed. 
Among them, the Bernstein copula has two remarkable features. One is 
that thanks to the Weierstrass approximation theorem, any copula density 
(the density function of absolutely continuous copula) can be approximated 
uniformly on [0, l] 2 by the Bernstein densit y c{u, v\ R) = w ^qzC{u, v; R) when 
m and n are sufficiently large (see, e.g., iKingslevI (il95lf )^ Therefore, the 
Bernstein cop ula can approximate any co ntinuous bivariate distributions. 
On this basis, ISancetta and Satchelll ( 120041 ) proposed an empirical Bernstein 
copula density estim ator and studied its consistency in mean square error. 



Janssen. et al.l (120121 ) investigated the empirical Bernstein density estimator 



on its almost sure consistency rates and asymptotic normality. 

Another feature of the Bernstein copula is that it is a finite mixture 
distribution as stated above. Following the definition of Baker's distribution, 
it is easy to generate random numbers. This has practical importance because 
the copula is used not only for analyzing existing data, but also for making 
predictions by Monte Carlo simulation. Moreover, using the property of finite 
mixture distribution, the expectation-maximization (EM) algorithm can be 
used to estimate parameters. In this paper, we propose estimation methods 
based on this idea. 

The remainder of this paper is organized as follows. In Section |2} EM 
algorithms for estimating parameters are proposed for various settings. We 
also prove a local convergence of our EM algorithm. Moreover, asymptotic 
properties of the proposed estimators as semiparametric estimators are pro- 
vided. In Section [31 we give illustrative examples based on Baker's distribu- 
tion models with the proposed algorithms using real datasets. Mathematical 
details are given in the Appendix. 
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2. EM algorithms based on pseudo-likelihood 



2.1. Continuous case 

As we have seen, Baker's distribution or the Bernstein copula function 
is a finite mixture distribution, and hence EM algorithm methodology ca n 
be applied for maximum likelihood estimation (IMcLachlan and Peell . l2000l ). 
Throughout this paper, we assume that marginal distributions F and G are 
estimated in advance and treated as known functions in the subsequent anal- 
ysis. This two-stage estim ation is referred to as t he semiparametric metho d 
and is wid e ly used (e.g., Genest. et all ( 1995 ). Charpentier. et al. ( 2007 ). 
Kim, et all f l2007h . IChoros. et all ( bold )). In this paper, F and G are esti- 
mated as N/(N + 1) times the marginal empirical distributions Fn and Gn 
of X and Y (where iV is the sample size), and their density functions / and 
g, if they exist, are estimated with kernel estimators (see Section [3]). The 
likelihood function with F, G, f and g replaced by their estimators is called 
the pseudo-likelihood function. 

In this subsection, we suppose that F and G are absolutely continuous 
with densities / and g, respectively. The density functions of their fcth and 
Zth smallest order statistics with sample sizes m and n can be written as 



d 

dx 
d 



Fh-m. \X\ 



mb 



k— l.m— 1 



(F(x))f(x), 



(5) 



9i-.n(y) = -r-Gi.. n (y) = nbi^ 1 ^ 1 (G{y))g{y). 
dy 

The density of Baker's bivariate distribution then can be written as 



h(x,y;R) = ^2^2r kl f k:m (x)g bn (y). 

k=l 1=1 

Using the copula density 

m n 

c(u,v;R) = mnj^ J^r w 6 fc _i )m _i(u)6i_ ljn _i( 



k=l 1=1 



(6) 



density has another expression 

h(x,y;R) = c(F(x),G(y);R)f(x)g(y). 

Suppose that an i.i.d. sample (xi,yi), i = 1,...,N, is obtained from 
Baker's distribution (E]). According to the standard method for estimating 
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a finite mixture distribution, we introduce a pair of unobserved variables 
(Ki, for observation i, with probability Pr(Ki = k, Li = I) = r k i, k G 
{1, . . . , m}, I G {1, . . . , to}, i = 1, . . . , N. We also define an m x to matrix 
t% = (t^m) as a dummy variable with elements 

(l ((K i ,L i ) = (k,l)), 

Note that Tj and (i^, Lj) are one-to-one. The likelihood for the full dataset 
(x u yi, Ti), i = 1, . . . , N, is given by 

N m n 

YlYlYl{rklfk: m (Xi)gi:n(yi)} T%M - (8) 

i=l fe=l Z=l 

The E-step in the EM algorithm calculates the conditional expectation given 
1, . . . , N, that is, 

Ti,ki = E[r iM | (xi,yi);R\ 
_ r k ifk:m{xi)gi :n (yi) 
h(xi,yi]R) 
r k ib k _i, m _i{F{xi))bi_i. n _i(G{yi)) 

c(F( Xi ),G(yi);R) 



(9) 



The M-step maximizes the logarithm of the likelihood (jSJ) by assuming r^jy = 
r^fci. The logarithm of the expectation of dSJ) divided by iV is 

- TV m n m n 

j^^^^n M \og{r k if k , m (xi)gi, n (yi)) = f H log r w + const., (10) 

i=l fc=l !=1 k=l 1=1 

where f w = J2f=in,ki/ N - 

Maximizing ffTUl) is a convex problem and has a maximizer R* = (r^), 
because ffTUl) is a proper concave function in r k i and the region of R = (r k i) 
defined by (TTJ) is convex. Moreover, if f k i > for all k, I, the maximizer R* is a 
(relatively) interior point of the region ([1]). In such the maximizer R* 

is obtained by the Lagrange multiplier method under the conditions r k i = 
l/ m > Ylik r ki = l/ n - We introduce Lagrange multipliers and A;, and 
maximize 

L =itiL fki io s - e ^ fe ^ - ^) - e a ' fe ^ - n") 

fc=l !=1 fc ^ I ' I ^ k ' 
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with respect to r k i, and A/. Then, the maximizers r kl , /i£ and A^ are 
obtained as the solution of dL/dr k i = f k i/r k[ — fi k — \ L = and J2i r ki = l/m, 
J2k rkl = V 72 - To find fi* k and A* satisfying 

r k i = — ^— (11) 

as well as the restriction (pQ), we propose the following procedure: 

Algorithm 2.1. Step MO. Set nf = A[ 0) = 1/2. 

Step Ml. For fixed /x = (/ii, . . . , /i m )', find A = (A;) numerically from 

m 1 

V_*L*, l<Z<n. 
^ /i fc + A; n 

S^ep M2. For fixed A = (Ai, . . . , A n )' ; find n = (fi k ) numerically from 
n - ^ 

/ — = — , 1 < k < m. 

^ /j, k + Xi m 

Step M3. Update n = {fi k ) by 

to ■= to--[J2to-J2^) 

H-=l k=l ' 

Repeat Steps M1-M3 until 07]) converges. 

The following proposition states that Algorithm 12.11 converges locally, a 
proof of which is given in the Appendix. Empirical study suggests that this 
algorithm has the global convergence property, but the proof remains an open 
question. 

Proposition 2.1. Suppose that f k \ > 0. Then, Algorithm \2.1\ has the prop- 
erty of locally linear convergence. More precisely, the sequences of fj,^ and 
A^ (t = 0, 1, . . .) generated by Algorithm \2.1\ converge to the solution /x* and 
A* if the initial values /j,^ and X^ are close enough to n* and X* , and their 
convergence rates are fx® — /x* = 0(z/), A® — X* = 0(z/) (t — > oo) for a 
positive constants < v < 1. 

The EM algorithm is summarized as follows. 



1 < k < m. 
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Algorithm 2.2. Step 0. Set r^i = l/(mn). 
Step 1. Find T^ki by (Tj|) (E-step). 

Step 2. Update r ki by Algorithm\2J\ Step M0-M3 (M-step). 
Repeat Steps 1 and 2 until TiM converges. 

Note that this algorithm can be extended to Baker's distributions with 
three or more variables. 

Remark 2.1. In (TJ|) ; a common factor f{xj)g{yi) is canceled out in the 
numerator and the denominator. This is reasonable because the EM algo- 
rithm should be equivalent to the one based on the sample (F(xi),G(yi)), 
i — 1, . . . , N, having uniform marginals. 



Genest. et al.l ( 119951 ) developed an asymptotic theory for semiparamet- 



ric estimat i on of copula functions based on the pseudo-likelihood function. 



Tsukaharal (l2005f ) extended their results into M-estimation. Their results can 



be applied to our problem. Let 



rn n ^ 

c u (u,v;R) = mra^^rju— &*_!,„»_! (u)&i_i in _i(T;), 

k=l 1=1 
m n j 

c v (u,v;R) = mn >^ r fc ;& fc _ iim _i (u) — fy-i, n -i (v) 

k=l 1=1 

be the derivatives of c(u, v) with respect to u and v. To calculate these 
derivatives, we can use 

■J-h,n( U ) = n{b k -l, n -l(u) - b k , n -l(u)} (6_i n _i(li) = = 0). 

Proposition 2.2. Assume that the true value of the parameter R = (r k i) 
in (Tjp satisfies r k i > 0. As N goes to infinity, R is an asymptotically 
normally y/N -consistent estimator. A consistent estimator of NVar(R) = 
£ = (o~(ki),(k'i')) T mn x mn 'matrix with lexicographic index (hi) ) is given by 
S = B + SB + , where B and S are the sample covariance matrices of the 
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mn x 1 pseudo-observation vectors Ui = (ui( k n) an d v % — ( v i,(ki)) defined by 
mnb k _^ m _ 1 {F{x i ))bi_ 1)n _i{G{y i )) 



u 



i,(kl) 



Vi,(kl) =Ui,(kl) 



mn s~-^ 

~N ^ 



mn ^ b k ^ ltm _i{F(x j ))bi^ 1 ^i(G(y j ))c u (F(x j ),G(y j )\ R) 

c{F{ Xj ),G{ yj );RY 

hk-i,m-i{F{xj))bi- 1 , n -i{G{yj))c v {F{xj), G{y j )- } R) 

c(F( Xj ),G(vj);R) 2 



mn 



N 



r-Vi<Vj 



i = 1,...,N, respectively. B + is the Moore-Penrose pseudo inverse matrix 
ofB. 

Note that B is the observed Fisher information matrix when the marginals 
F and G are known. As most semiparametric estimators in statistical infer- 
ence, R is not an efficient estimator in the sense that its asymptotic vari- 
ance is larger than the one given by the Fisher information matrix when 
the marginals are known (unless r k i = 1/mn, i.e., c(u,v;R) = 1). See also 



Genest and Werkerl (120021 ) regarding the inefficiency of the FGM copula es- 
timator. 



Once the estimator R = (fjy) is obtained, where r k i = r k i = J2f=i %,ki/^ > 
the estimate of h(x,y;R) for a fixed point (x,y) is given as h(x,y;R) = 
Y^k=\ Y^=i^kifk:m(x)gi :n (y). Its asymptotic variance is evaluated as 



m n 



V&r(h(x,y;R)) « — fk-.m(x)f k '-.m(x)gi.. n (y)gi':n(y)^(ki),(k'i'), 



k,k'=l U'=l 



where SVfcn, is the ((kl), (k'l'))ih element of an mn x mn matrix E = 

(<7(fcZ),(fc'i'))- 

Remark 2.2. Sancetta and Satchel i (200 Jl) and Janssen, et al. (201k ) pro- 
posed to estimate r k i by 



r k i = Ut 



k-1 N k l-l N I . 

m N + 1 m n JS + 1 n 1 



i/ m and n go to infinity as the sample size N goes to infinity. This is a 
biased estimator in our case where m and n are fixed. 
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2.2. Discrete case 

The EM algorithm in Section 12.11 is also applicable when F and G are 
discrete distributions. Suppose that F has masses at a set A and G has 
masses at a set B. We assume that A and B are discrete sets. In particular, 
when A and B are finite sets, the data (a?i, i = 1, . . . , iV are summarized 
as an |A| x |£>| ordered categorical table (N ab ), where 

N ab = %{ie{l,...,N}\ (xi, yi ) = (a, &)}, aeA, beB. 

In this subsection, we modify the EM algorithm of Section [2~T1 so that Baker's 
distribution (J2J) can be applied to the data (N ab ) aG A,beB- 

The probability functions of X and Y are f(a) = Pr(X = a) = F(a) — 
F(a—) and g{b) = Pr(F = b) = G(b) — G(b—), respectively. The probability 
functions of kth and Ith order statistics Xha and are given by 

fk:m{o) =Fk:m{a) ~ F k:m (a—) 

=m{fl fc _ 1 , TO _ 1 (F(a)) - B^^Ffa-))}, 
gunih) =G l:n (b)-G l:n (b-) 

=ra{fl,_ 1 , n _ 1 (G(&)) - Bi_i tU _i(G(b-))}, (12) 

respectively. Using these, we have the joint probability function 

m n 

h(a,b;R) = Pr(X = a, Y = b) = } y r k if k . m (a)gi :n (b). 

k=l 1=1 

We introduce a dummy variable rj ab ,ki = J2i:( Xi , yi )=(a,b) wittl T iM defined 
in ([7]). The likelihood for the full data (JSJ) is rewritten as 



\;rkijk-.m\a)yi:n\y)\ 

a&A beB k=l 1=1 

The E-step for updating r) abjk i becomes 

N ab r k if k:m (a)gi m (b) 



Vab,ki = E[r] abM | (N ab );R] 



J2k=i EILi r k ifk:m{a)g hn {b) ' 



By letting = EagA E&eB ^a6,fc/, the M-step is given as the same form in 
Section EU that is, Step 2 of Algorithm E2 
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2.3. Mixed case 

Our approach can treat the case where one variable is continuous and 
the other one is discrete. For example, suppose that X ~ f(x) is continuous 
and Y ~ G(y) is discrete. Then, the density function of X^ = x and the 
probability function of Ym = b are given by fk-.m( x ) i n © and gi :n (b) in ffT2j) . 
respectively. The joint density function becomes 

Pr(Jf G (x x + <ix) y = 6) "\ n ^ 

b] R) = '—j^ ' = r klfk:m(x)gi:n(b) 

k=l 1=1 



m n 

mn^^6n, m -i(fW)/W{S,-i,„-i(G(6)) - £,_i,„_i ((?(&-))}■ 
fc=i i=i 

The E-step becomes updating 

•~ rn r I / \ T->"| r klfk:m{Xi)gi:n{yi) 

Ti,kl = E[T itk i | -RJ = 



h(xi,yi)R) 

_ r k ib k -i, m -i(F(xi)){Bi_ 1)n _i(G(yi)) - B^i^ 1 (G(y i -))} 

= EZiEtih-i, m -i(F^i)){Bi-i, n -i(G( m )) - B^-iiGfa-))}' 

and the M-step remains unchanged. 

2.4- The case where R is parameterized 

If R = (r k i) satisfying ([Q) is parameterized by a lower-dimensional param- 
eter 9 as r k i = rki(Q ) , the estimation becomes simpler. For the case of m = n, 



for instance, iBakerl (120081 ) discussed a subclass of bivariate distributions with 



a distribution function 

y; q, n) = (1 - q)F{x)G{y) + g-H^z, y) 

= (l-g)F(x)G( 2 /) + gC n ± (F(a ; ),G( 2 /)), < g < 1, (13) 



where 



1 n 

H+(x,y) = -y^F k:n (x)G k:n (y) = C+(F(x),G(y)), 
fc=i 

C+(w, u) = n>~^ B fe _i )Tl _i(«)JB fc _i jn _i(v), 



fc=i 
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and 



1 n 

H n( X iV) = - ^2 F toni x ) G n-k+l:n(y) = G~ (F(x), G(y)), 
k=l 
n 

C~(u,v) = n^S fc _ ljn _i(u)S n _ fc>n _ 1 (t;). 
k=i 

The densities of if* and C* are denoted by /i* and c*, if they exist. if *(x, y) 
(or H~(x,y)) is the largest positive (or negative) correlation case among 
Baker's distributions with m = n. The rank correlation of if* is ±(n — 
l)/(n + l). 

Function H ± (x,y;q ) n) is Baker's distribution with 



(1 - q)/n 2 + g<J w /n 



(for ff+), 



(1 - g)/n 2 + q6 kin -i +1 /n (for if " 



1 < kj < n, 



where 5jy is Kronecker's delta, is parameterized by the scalar parameter 
q. The parameter q adjusts the degree of independence. When q = 0, X 
and Y are independent. When q > 0, X and Y are positively (or negatively) 
correlated for the distribution H + (or H~). These models are expected 
to represent highly correlated distributions with fewer parameters than the 
original Baker's distribution. 

Originally, Baker's distribution was proposed as an extension to the FGM 
distribution with the limitation that its correlation does n ot exceed 1/3 
for the case of continuous marginals (jSchucany. et all . 119781 1 . Hence, the 



range of correlation of Baker's distribution gathers attention, and the largest 
correlation case if* (x ,y) is well studied. For the distribution H+(x,y), 
Lin and Huamj fl2010f ) investigated convergence conditions and the conver- 
gence rate of the maximal correlation going to the maximum (correlation 
of the Frechet-Hoeffding upper bound) as n tends to infinity. iDou. et al 



(120 121 ) proved the TP 2 property and derived the limiting distribution of 
(X,U), U = < Jn{F {X) - G(Y)), where (X, Y) are distributed as H+(x,y). 
Huang, et al.l ( 120131 1 proved that the copula function C + (w,t>) in the largest 



correlation case with u, v fixed is a non- decreasi ng func t ion in n. 

In modeling the joint distribution functions, iBakerl (120081 ) chose the pa- 
rameter q by minimizing the negative log-likelihood and the Kolmogorov- 
Smirnov statistic for a given set of n. Here we treat n as an integer-valued 
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parameter to be estimated, and as an alternative, we propose an EM al- 
gorithm below to estimate the parameters (q, n) simultaneously. Suppose 
that an i.i.d. sample i = 1,...,N, is obtained from the continuous 

distribution H£(x,y; q,n) with the density 

h+(x, y; q, n) =(1 - q)f(x)g(y) + qh+(x, y) 

={l-q + q( £(F(x),G(y))}f(x)g(y). 

Algorithm 2.3. Step 0. Set (q,n) = (1/2,1). 
Step 1. E-step: 



;i - q)f(xi)g(yi 



1-q 



;i - q)f(xi)g(yi) + qh±(x h y f ) l-q + qc±(F(xi), G(yi)) ' 



(14) 



i = l,...,N. 

Step 2. M-step: 



q 



i N 
-Yn 



N 



n 



argmax ]T(1 - f t ) log (c±(FpQ), G(YJ)) . 



i=l i=l 

Repeat Step 1 and Step 2 until (q, n) converges. 

The asymptotic variance of q is approximately evaluated as s/(N(3 2 ), 
where (3 and s are the sample variances of the pseudo-observations Ui and Vi 
defined by 



4 + c±(F(s i ),G(y i )) 



l-q + qc±(F(xi),G(yi)) 



Vi =Ui 



N 

q_ 

N 



{l-q + qct(F( Xj ),G( yj ))y 



^ {-l + c ±(F(i i ),G(y i ))}£c±(F(x i ) I G(i/ i )) 



j-yi<yj 



{i - q + q°n(F(xj), G(yj))y 



i = 1, . . . , N, respectively. 

In the case where both X and Y are discrete distributions, the E-step 
(114j) in Algorithm 12.31 is replaced with 



Ti 



'i - q)f(xi)g{yi 



;i - q)f(xi)g(vi) + q\ ELi fk:n( x i)9k-.n(yi) ' 
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(15) 



where f(x t ) = Ffa) - F(x { -), gfa) = G(y { ) - Gfa-), and f km (xi) and 
9k:n{yi) are given in ( 1T2"|) . If the joint distribution consists of both continu- 
ous and discrete variables, then their respective density fl3J) and probability 
function ()12p should be used. In the next section, we give examples of fitting 
model (JTBP. 



3. Illustrative examples 

3.1. Consomic mouse data 

In this section, we demonstrate how our algorithms work for real data 
analysis. The first data are measurements of blood concentrations of bio- 
chemical substances in mice. We apply Algorithm 12.21 for fitting Baker's 
distribution (jSJ) with continuous variables. 

The dataset consists of measurements of triglycerides (TG) and plasma 
high-density lipoprotein cholesterol (HDL) as plotted in Figure |2j TG and 
HDL are important indicators of metabolic syndrome and are correlated with 
the pathogenesis of cardiovascular disease in humans. To detect the genes 
responsible for adiposity, TG and HDL data are taken from consomic mouse 
strains of 314 10-week old females. A consomic strain is an artificial inbred 
strain with one specified chromosome replaced by another chromosome from 



a different inbred strain ( iTakada. et all . 120121 ). For example, B6-Chr4MSM 



appearing in Figure H] means a consomic strain that has all chromosomes 
from the mouse strain C57BL/6 (B6) except for chromosome 4, which is 
from the mouse strain MSM/Ms (MSM). The data are taken from 30 kinds 
of consomic strains including pure strains B6 and MSM, and hence are ve ry 



heterogeneous. This dataset is available from lTakada and Shiroishil ( 120121 ). 

Using the Gaussian kernel estimator, we first estimate the marginal den- 
sity functio ns. The bandwid ths are selected according to Silverman's "rule 



of thumb" ( iSilvermanl . Il986l ). As described in Section I2.1[ we use the em- 
pirical distribution functions to approximate the (cumulative) distribution 
functions. The estimated marginal densities and distribution functions are 
respectively shown in the left and right panels of Figure [TJ Subsequently, we 
estimate the Bernstein copula density with the EM algorithm (Algorithm 
12. 2p . In the algorithm, we estimate the matrix size of R by the Akaike infor- 
mation criterion (AIC). From Table (H we find that AIC attains its minimum 
5209.95 when (m,n) = (2,3). The corresponding estimate of R is obtained 
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Figure 1: Estimated marginals of TG and HDL. 

(Left: density functions. Right: cumulative distribution functions.) 



as 

g_ /0.333 0.106 0.061 
~ V0.000 0.227 0.273 

With this R, we estimate the density <Q as h(x,y;R). A contour plot of 
the estimated joint density is shown in Figure |2J 

3.2. Illinois state education data 

The second example is to estimate the joint density function of the Illinois 
Standards Achievement Test (ISAT) scores w hich are available from the web- 



site of the llllinois State Board of Education! . We use the ISAT performance 



results for reading and mathematics in grade 3 of = 2991 public schools 
and districts in 2009. For each school or district, percentages of students 
meeting or exceeding test standards are tabulated (see Table [2]). The data 
are plotted in Figure[3] (left). Each point indicates a public school or district. 

We first estimate the density functions and (cumulative) distribution 
functions approximately by the kernel method and the empirical distribu- 
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Table 1: AIC for female consomic mouse data. 
(The minimum AIC is indicated with a box.) 



m \ n 


1 


2 


3 


4 


5 


6 


8 


10 


1 

2 


5241.43 
5241.43 


5241.43 


5241.43 


5241.43 


5241.43 
5210.59 


5241.43 
5210.24 


5241.43 
5212.11 


5241.43 
5215.63 


5210.00 


5209.95 


5211.59 


3 


5241.43 


5211.99 


5211.38 


5213.66 


5214.90 


5217.07 


5223.38 


5229.96 


4 


5241.43 


5214.00 


5215.13 


5218.60 


5219.76 


5223.91 


5233.65 


5243.72 


5 


5241.43 


5214.81 


5217.95 


5223.10 


5226.30 


5231.55 


5245.62 


5259.34 


6 


5241.43 


5216.02 


5220.01 


5225.42 


5230.88 


5238.10 


5255.44 


5273.20 


8 


5241.43 


5218.20 


5224.89 


5233.21 


5241.56 


5252.91 


5277.31 


5302.13 


10 


5241.43 


5220.99 


5229.22 


5241.35 


5253.02 


5268.17 


5300.25 


5331.73 



Table 2: 2009 IS AT (Illinois Standards Achievement Test). 

The percentage of student scores meeting or exceeding standards in reading and mathe- 
matics, grade 3 for N = 2991 schools and districts. 



District name/ School name 


Reading 


Mathematics 


Payson CUSD 1 


78.6 


88.4 


Seymour Elementary School 


78.6 


88.4 


Liberty CUSD 2 


84.6 


100.0 


Liberty Elementary School 


84.6 


100.0 


Central CUSD 3 


63.6 


87.9 


Central 3-4 Middle School 


63.6 


87.9 


CUSD 4 


69.6 


71.4 


Greenfield Elementary School 


69.6 


71.4 


Quincy SD 172 


73.0 


86.9 


Adams Elementary School 


67.1 


79.7 


Dewey Elementary School 


75.4 


93.4 


Ellington Elementary School 


90.1 


94.4 
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Figure 2: TG and HDL data (female consomic mice) and estimated contour. 
(Dots: B6, Pluses: B6-Chr4MSM, Triangles: MSM, Circles: others.) 

tion function. Pearson's correlation and the sample rank correlation of the 
data are 0.853 and 0.851, respectively. Because of the high correlation, we 
use the largest correlation model H+(x,y) in (fT31 . The estimated density 
h + (x,y; (j,n) is plotted in Figure [3] (right). Using the EM algorithm in Sec- 
tion [23J we obtain the estimates (jq, n) = (0.919, 17). The approximated 
variance of q is 9.06 x 10~ 5 . The rank correlation under the estimated model 
is q(n-l)/(n + l) = 0.817. 

The analysis above assumes that the scores are continuous variables. 
However, as shown in Table [21 the scores are rounded off to the near- 
est one-tenth value. They are discrete variables taking values k/10, k = 
0, 1, . . . , 1000. Also, there are many ties in this dataset. The number of 
unique values for Xi, y^ and (xi,yi) are 602, 456 and 2260, respectively, 
among iV = 2991 schools and districts. Applying the EM algorithm for 
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40 60 
Reading 



40 60 
Reading 



Figure 3: ISAT percent meeting or exceeding standards. 
(Left: data plot. Right: estimated density contour.) 



the discrete case, that is, using Algorithm 12.31 with (III)) replaced by (fl5|) . 
we obtain the estimates (q~, n) = (0.933, 18). The rank correlation under the 
discrete model is 0.835, which is slightly closer to the sample rank correlation 
than the one under the continuous model. 

Appendix A. Proof of Proposition 12.11 

For vectors /x = (fik)i<k<m and A = (\i)i<i< n , define column vector 
valued functions by 

n _ j 

/(M; A) = (ifcO; A))i< fc<m , f k (fjL; A) = V — , 

^ + A; m 

and 

f k i 1 



Let l m = (1, . . . , 1)'. Each step of Algorithm 12 . 1 1 can be rewritten as follows: 
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Step MO. Set the initial values for /x^ = fjP. Let t = 0. 

Step Ml. For fixed fi {t \ let A w be the solution of g(X; /x (i) ) = 0. 

Step M2. For fixed A (i) , let £ (m) be the solution of / Qtx; A (t) ) = 
0. 

Step M3. Let 

m 

so that EfcO (m) )fc = Efc(M (0) )ifc- Let t := t + 1 and go to Step 
Ml, unless it converges. 

Let (/x*, A*) be a solution of g(X*; /x*) = 0, /(/x*; A*) = 0. Since (/x* + 
rl m , A* — rl n ) is also a solution for arbitrary r G 1, we assume without loss 
of generality that E fe (/x°)& = From Step Ml, 

O=0(A<V) 
=<7(A* ; /x*) + V A </(A*; M *)(A (i) - A*) + V^(A*; - 

and hence 

A W - A* = -(V A <7(A*; M *))- 1 V^(A*; M *)( M <*> - //*). (A.l) 

Here '=' means that the difference of left-hand side and right-hand side is of 
the order o(max(||/*W _ M *|| 7 ||A (t) - A*||)). Similarly, from Step M2, 

0=/(M (m) ;A«) 
=/(/**, A*) + V^/x*; A*)(/x (t+1) - A**) + V A /( M *; A*)(A« - A*), 

and hence 

— /x* = -(V m /(m*; A*))^ 1 V a /(/x*; A*)(A« - A*). (A.2) 
Because 1^,M (0) = E fe (M°)fc = = 4/ 1 *: Ste P M3 is rewritten as 

M (t+1) - /x* = - il m 4(M^ +1 ) - /x*) - /x* = J(^ +1 > - M*), (A.3) 

m 

where 

m 
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Combining flA.2j) and (1A.3[) . we have 



= - J(V M /( M *; A*))- 1 V a /(/x*; A*)(A« - A*). (A.4) 
Let C = (c w )n»xn, G = (diag(c fc+ )) mxm , H = (diag(c +i )) nxn , where 

n m 

Tkl 



c k + = ^2 c kh C + i 
1=1 



k=l 



-V A <7(A*;/x*) = tf, 



((m*), + (a*)0 2 ' 

Simple calculations yield 

-V M /(^*;A*)=G 

-V^(A*; tf) = C, -V A /(/x*; A*) = C. 
Therefore, (1A. If) and (1A.4|) are rewritten as 

A (t) - A* = -H~ l C\v {t) ~ A**), 
M (m) _ ^* = _jg^ 1 C(A (<) - A*). 

Combining (I A . 5 1) and flA.6j) . we have 

M (t+i) _ M * = JG- l CH- l C\n {t) ~ A**), 
A (t+i) _ A * = H- l C'JG~ l C(\ {t) - A*). 



(A.5) 
(A.6) 



(A.7) 
(A.8) 



To ascertain the asymptotic behavior of /Lt® and A^ as t — > oo, we need 
to find the eigenvalues of the matrices JG~ l CH~ l C and H~ l C JG~ X C, 
respectively ( Hageman and Yound . 1981 ). 

Let D = G~zCH~2 . We first show that matrix D has the largest singular 
value cxi(-D) = 1. This is because for column vectors u = (/Ufc)i<fc< m and 
v = {vi)i<i<m 



u'Cv = ^ u kVlC k l < U l°kl ^2 



vfcki 



'J2 u k ck + J2 v i c+i 



k,l 



k.l 
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and hence 

<Ji(D) = max u'Dv = max "S^ n^vi ° kl 

||u|| = ||«|| = l Enl=J2vf=l^ JC^C^i 



k,l 

max u k v t c kl 



k.l 



< max 



-2 , JJ2 u i c k+J2 v f c +i = L 



This upper bound is attained when <J\{D) = u'Dv with 

u = — GH m , v = — HH n , C++ = y^c ki , 

because u'u = v'v = 1 and 

u ' Dv = —(f m G l ?)G- l ?CH- l 2(Hh n ) = — l' m Cl n = 1. 

c ++ c ++ 

Therefore, DD' has the largest eigenvalue 1, and one of the corresponding 
eigenvectors is u. 

From the assumption that f k \ > 0, /, DD' is a positive matrix (i.e., all 
eleme nts are positive). By the Perron- Frobenius theorem (e.g-. lHorn and Johnson 
(11990I )). the multiplicity of the largest eigenvalue 1 of DD' is 1. That is, 

DD' = t m t' m G^ + V u k u k u' k , 

where 1 > > ■ ■ ■ > v m > and = u' k u = u' k G^i m / y/c ++ . 

Hence, the matrix JG~ 1 CH~ 1 C appearing in (1A.7I) is rewritten as 

JG^CH^C =JG-*DD'G* 

JG-^Gh m l' m G^G^ + JA = JA, (A.9) 



where 



c ++ 



A = G U^2^Uku' k jG^. 
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This matrix J A has the same nonzero eigenvalues as those of 

(m v . 

k=2 ' 171 

which has the eigenvalues u 2 , . . . , v m and 0. Here we used = u' k G^t m . More 
precisely, it holds that 

JAB = BN, B= (l m , JG-3(« 2 ,...,tO), 
where AT = diag(0, ^2, • • • , ^ m )- Matrix 5 is nonsingular, because 
Gififl ^CG-^ 2 ,...,u m )\ = (Ghm;n2; ._ Um) 

and G^t m span {it 2 , • • • , u m }. Therefore, 

JA = BNB~ 1 . (A.10) 

Combining flA~T0|) with f lA~7j) and flA~9|) . we have B^ifi^-fi*) = NB' 1 ^- 
fj,*), and hence for arbitrary e > 

||irV m) - < (1* + e) ||irV*) - (A.ll) 

when i is large enough. 

Similarly the matrix H~ 1 C'JG~ 1 C in ( 1A.8I) is shown to be diagonalized 
with the same eigenvalues 1 > u 2 > ■ ■ ■ > v m m(m,n) > and (if m < n). 
Hence, for the sequence \^\ the inequality of the same type as (I A .111) holds. 
These inequ alities imply the linear convergen ces of fi^ and A® with the rate 
z/2 + e (e.g., iDennis Jr. and Schnabel (119961 ) ). 
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